1 Setup

1.1 Packages and Functions

The code below first cleans your environment, to avoid carrying over unnecessary functions or variables, and then loads the libraries used in our script. We begin by installing and loading the required packages. For the BDA itself we mainly use brms (Bürkner, 2020), while bayesplot (Gabry and Mahr, 2020) provides support with various plots and functions for calculating credible intervals.

rm( list = ls() )  # Cleans the environment.
# You can install packages in case they are not installed already.
# In case you don't have rstan installed, see:
# https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
# install.packages(c("tidyverse", "skimr", "ggthemes", "patchwork", "ggdag",
#                     "dagitty", "brms", "loo", "bayesplot"))
library(tidyverse) # For transforming and visualizing data.
library(skimr)     # For getting data summaries
library(ggthemes)  # Themes for ggplot
ggplot2::theme_set(theme_tufte())
library(patchwork) # Combining many plots into the same figure.

library(ggdag)     # For DAG analysis
library(dagitty)   # For DAG analysis

library(brms)      # BDA packages. Alternatively, one can use rethinking & rstanarm.
library(loo)       # For comparing different models' performance
library(bayesplot) # Plotting BDA output by Gabry et al.
bayesplot::color_scheme_set("viridis") #Uses the viridis palette on bayesplots

For reproducibility and efficiency, we set an arbitrary seed, the sampling parameters, and the number of cores used to speed up the MCMC sampling.

SAMPLES = 5000  # Iterations per chain, including warmup.
WARMUP = 1000   # Warmup iterations discarded before analysis.
CHAINS = 4      # Number of MCMC chains.
SEED = 2020     # Arbitrary seed for reproducibility.
DELTA = 0.99    # adapt_delta, the target acceptance rate.
TREE = 13       # max_treedepth for the NUTS sampler.
set.seed(SEED)
options(mc.cores = parallel::detectCores())  # Use all available cores.

2 Overview of the Dataset

First we load the data and take a look at it. As this report analyzes the performance differences between Bugspots and Linespots, we focus only on those variables relevant to that comparison.

d = read_delim(
  '../data/full_evaluation.csv',
  delim = ",",
  locale = locale(decimal_mark = "."),
  col_names = TRUE,
  col_types = cols(
    AUCEC1 = col_double(),
    AUCEC100 = col_double(),
    AUCEC20 = col_double(),
    AUCEC5 = col_double(),
    AUROC = col_double(),
    Algorithm = col_factor(),
    Depth = col_double(),
    EInspect10 = col_double(),
    EInspect100 = col_double(),
    EInspect200 = col_double(),
    EInspect50 = col_double(),
    EInspectF = col_double(),
    EXAM = col_double(),
    FixCount = col_double(),
    Future = col_double(),
    LOC = col_double(),
    Origin = col_double(),
    Project = col_factor(),
    Runtime = col_double(),
    Time = col_factor(),
    Weight = col_factor(),
    comit_version = col_factor(),
    commits = col_double(),
    language = col_factor(),
    url = col_factor()
  )
)
# There seem to be some cases where no faults were found in the pseudo future.
# As those cases can't tell us anything, we will remove them.
d = subset(d, d$FixCount != 0)
d$EInspectFLong = ceiling(d$EInspectF * d$LOC)

# For clarity, we only keep the variables we later use.
d = data.frame("Algorithm" =  d$Algorithm, "LOC" = d$LOC, "FixCount" = d$FixCount,
               "Project" = d$Project, "language" = d$language, "AUCEC1" = d$AUCEC1,
               "AUCEC5" = d$AUCEC5, "AUROC" = d$AUROC, "EInspect10" = d$EInspect10,
               "EInspect100" = d$EInspect100, "EInspectF" = d$EInspectFLong, "EXAM" = d$EXAM)

Descriptive Statistics

Table 2.1: Data summary
Name d
Number of rows 480
Number of columns 12
_______________________
Column type frequency:
factor 3
numeric 9
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
Algorithm 0 1 FALSE 2 Lin: 240, Bug: 240
Project 0 1 FALSE 32 mys: 18, woo: 18, pre: 18, ser: 18
language 0 1 FALSE 8 Jav: 120, Jav: 72, Typ: 66, C: 54

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
LOC 0 1 1678978.99 3027324.25 21182.00 269496.00 624173.00 1565072.00 20705587.00 ▇▁▁▁▁
FixCount 0 1 91.80 72.35 7.00 40.75 69.00 115.00 418.00 ▇▃▁▁▁
AUCEC1 0 1 0.08 0.06 0.00 0.04 0.07 0.11 0.35 ▇▆▂▁▁
AUCEC5 0 1 0.20 0.10 0.00 0.12 0.19 0.26 0.56 ▅▇▅▂▁
AUROC 0 1 0.02 0.03 0.00 0.00 0.01 0.02 0.16 ▇▁▁▁▁
EInspect10 0 1 0.10 0.34 0.00 0.00 0.00 0.00 2.00 ▇▁▁▁▁
EInspect100 0 1 0.67 1.30 0.00 0.00 0.00 1.00 8.00 ▇▁▁▁▁
EInspectF 0 1 2993.67 10293.40 1.00 63.00 235.00 1160.00 112581.00 ▇▁▁▁▁
EXAM 0 1 0.22 0.06 0.08 0.18 0.22 0.26 0.38 ▂▅▇▃▁

Distributions (Histograms)

Scaled Distributions (Histograms)

With both LOC and FixCount spanning multiple orders of magnitude, we standardize them to improve sampling performance.

d$LOC = scale(d$LOC)
d$FixCount = scale(d$FixCount)


3 DAG Analysis

Based on the data we gathered, we built a DAG representing the causal relationships as we assume them. In this case, the analysis is rather simple: we designed the experiment in such a way that there is no incoming causal relationship to Algorithm, so we can use all parameters without confounding problems.

DAG

Runtime_dag <- dagify(
  Project ~ Language,
  LOC ~ Project,
  FixCount ~ Project,
  EvaluationMetrics ~ Algorithm + Project + LOC + FixCount + Language,
  exposure = "Algorithm",
  outcome = "EvaluationMetrics",
  labels = c(
    "Project" = "Project",
    "Language" = "Language",
    "LOC" = "LOC",
    "FixCount" = "Fix\nCount",
    "Algorithm" = "Algorithm",
    "EvaluationMetrics" = "Evaluation\nMetrics"
  )
)

ggdag(Runtime_dag, text = FALSE, use_labels = "label", layout="circle") +
  theme_dag()

***

Causal Paths

The graph shows that there is only a single possible causal path from ‘Algorithm’ to ‘Evaluation Metrics’, so regardless of which other predictors we add to a model, they will not introduce bias or confounding.

ggdag_paths(Runtime_dag,
            text = FALSE,
            use_labels = "label",
            shadow = TRUE, adjust_for = c("LOC", "Project", "Language", "FixCount"),
            layout="circle") +
  theme_dag()


Adjustment Sets

Finally, we test the four sets of predictors we plan on using to confirm that each is a valid adjustment set.

isAdjustmentSet(Runtime_dag, c("LOC"))
isAdjustmentSet(Runtime_dag, c("LOC", "FixCount"))
isAdjustmentSet(Runtime_dag, c("LOC", "FixCount", "Project"))
isAdjustmentSet(Runtime_dag, c("LOC", "FixCount", "Project", "Language"))
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE

4 BDA workflow

We are interested in the evaluation metrics as our outcome and Algorithm as our exposure, and we control for LOC, FixCount, Project, and language. We then build generalized linear models (GLMs) for different predictor combinations and compare them using PSIS-LOO.

For the intercept priors we used insights from past research, either our own or from others in the field. The remaining priors were chosen such that the prior predictive checks (pp_check) show models allowing values outside the observed data range, keeping the priors weakly informative.

4.1 AUROC Models

For the area under the ROC curve we use a beta likelihood, as AUROC is the proportion of the unit square that lies under the ROC curve and is therefore bounded by 0 and 1.

While we do not have past experience with AUROC values for these algorithms, we know from our past work that even in the early parts of the result lists, precision and recall are very low for both. To represent this, we center the intercept prior at 0.1 on the outcome scale, or roughly -2 on the logit scale.
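
This correspondence between the outcome and logit scales can be checked directly in R; a quick sanity check (not part of the original script):

# Converting between the outcome and logit scales.
qlogis(0.1)  # Outcome to logit scale.
## [1] -2.197225
plogis(-2)   # Logit to outcome scale.
## [1] 0.1192029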

As we presume most of the values to be small and somewhat concentrated, we expect the precision parameter phi to be fairly large, since the variance of a beta distribution shrinks as phi grows. We put a wide prior on phi as we are not certain where exactly it will lie.

M3

\[ \begin{split} \mathcal{M}_3: \mathrm{AUROC_i} \thicksim &\ \mathrm{Beta}(\mu_i, \phi) \\ \mathrm{logit}(\mu_i) = &\ \alpha + \beta_{A} \mathrm{Algorithm}_i + \beta_{L} \mathrm{LOC}_i + \beta_F\mathrm{FixCount}_i + \alpha_{Project[i]} \\ \alpha \thicksim &\ \mathrm{Normal(-2, 1)} \\ \beta_{A}, \beta_{L}, \beta_F \thicksim &\ \mathrm{Normal(0, 0.25)} \\ \alpha_p \thicksim &\ \mathrm{Weibull}(2, 1)\ \ \ \mathrm{for\ }p = 1..32 \\ \phi \thicksim &\ \mathrm{Normal(50, 20)} \end{split} \]

mauroc3p = brm(
  formula = AUROC ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
  data = d,
  family = Beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=sd),
    prior(normal(50, 20), class=phi)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mauroc3p, nsamples = 100) + scale_y_continuous(trans='sqrt')

mauroc3= brm(
  formula = AUROC ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
  data = d,
  family = Beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=sd),
    prior(normal(50, 20), class=phi)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)

The LOO comparison below shows equal performance for M3 and M4. As M3 is the simpler model, we choose it as our candidate model.

Results:

summary(mauroc3)
##  Family: beta 
##   Links: mu = logit; phi = identity 
## Formula: AUROC ~ 1 + Algorithm + LOC + FixCount + (1 | Project) 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~Project (Number of levels: 32) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.64      0.10     0.47     0.86 1.00     3517     6331
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            -4.09      0.13    -4.35    -3.82 1.00     2953     4717
## AlgorithmBugspots    -0.35      0.08    -0.51    -0.19 1.00    16001    12322
## LOC                  -0.15      0.09    -0.33     0.04 1.00     7034    10077
## FixCount              0.14      0.05     0.05     0.23 1.00    12482    11552
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi    40.27      3.37    33.91    47.10 1.00    13338    11445
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The posterior predictive check shows a good fit.

pp_check(mauroc3, nsamples = 100) + scale_y_continuous(trans='sqrt')

On the logit scale, the effect of Bugspots is clearly negative, with no overlap with 0 at all. On the outcome scale there is some overlap between the two algorithms, but the respective means lie outside the 95% intervals shown, and there is a clear tendency for Linespots to have higher AUROC values than Bugspots.

mcmc_areas(mauroc3, pars = c("b_AlgorithmBugspots"), prob = 0.95) + ggtitle("AUROC Posterior Distribution\n With 95% intervals")

eff = conditional_effects(mauroc3, effects = c("Algorithm"))
eff$Algorithm
eff
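
To interpret the logit-scale coefficients on the outcome scale, the linear predictor can be passed through the inverse-logit function. A rough sketch using the posterior means from the summary above (point estimates only, ignoring posterior uncertainty):

# Approximate mean AUROC per algorithm at average LOC and FixCount.
plogis(-4.09)         # Linespots (reference level): roughly 0.016.
plogis(-4.09 - 0.35)  # Bugspots: roughly 0.012.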

Diagnostics: all diagnostics look good.

min(neff_ratio(mauroc3))
## [1] 0.1839649
max(rhat(mauroc3))
## [1] 1.000528
plot(mauroc3)

M1

mauroc1p = brm(
  formula = AUROC ~ 1 + Algorithm + LOC,
  data = d,
  family = Beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(50, 20), class=phi)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mauroc1p, nsamples = 100) + scale_y_continuous(trans='sqrt')

mauroc1= brm(
  formula = AUROC ~ 1 + Algorithm + LOC,
  data = d,
  family = Beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(50, 20), class=phi)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mauroc1, nsamples = 100) + scale_y_continuous(trans='sqrt')

summary(mauroc1)
##  Family: beta 
##   Links: mu = logit; phi = identity 
## Formula: AUROC ~ 1 + Algorithm + LOC 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            -3.95      0.08    -4.10    -3.80 1.00     8852     8956
## AlgorithmBugspots    -0.35      0.08    -0.51    -0.18 1.00    10228    10003
## LOC                  -0.22      0.05    -0.33    -0.12 1.00     9609     9029
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi    25.69      2.21    21.51    30.26 1.00     7420     8287
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

M2

mauroc2p = brm(
  formula = AUROC ~ 1 + Algorithm + LOC + FixCount,
  data = d,
  family = Beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(50, 20), class=phi)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mauroc2p, nsamples = 100) + scale_y_continuous(trans='sqrt')

mauroc2= brm(
  formula = AUROC ~ 1 + Algorithm + LOC + FixCount,
  data = d,
  family = Beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(50, 20), class=phi)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mauroc2, nsamples = 100) + scale_y_continuous(trans='sqrt')

summary(mauroc2)
##  Family: beta 
##   Links: mu = logit; phi = identity 
## Formula: AUROC ~ 1 + Algorithm + LOC + FixCount 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            -3.98      0.08    -4.14    -3.83 1.00     9972    10655
## AlgorithmBugspots    -0.30      0.08    -0.46    -0.14 1.00    12215    10808
## LOC                  -0.23      0.06    -0.35    -0.13 1.00    11026    10413
## FixCount              0.15      0.04     0.07     0.23 1.00    12624    10608
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi    26.38      2.26    22.18    31.00 1.00     9021     9805
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

M4

mauroc4p = brm(
  formula = AUROC ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
  data = d,
  family = Beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=sd),
    prior(normal(50, 20), class=phi)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mauroc4p, nsamples = 100) + scale_y_continuous(trans='sqrt')

mauroc4= brm(
  formula = AUROC ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
  data = d,
  family = Beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=sd),
    prior(normal(50, 20), class=phi)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
summary(mauroc4)
##  Family: beta 
##   Links: mu = logit; phi = identity 
## Formula: AUROC ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language) 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~language (Number of levels: 8) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.54      0.23     0.18     1.08 1.00     4388     5167
## 
## ~Project (Number of levels: 32) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.52      0.10     0.36     0.75 1.00     5059     8323
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            -4.10      0.25    -4.55    -3.57 1.00     5965     6573
## AlgorithmBugspots    -0.36      0.08    -0.52    -0.20 1.00    20536    12478
## LOC                  -0.11      0.09    -0.29     0.07 1.00    12468    11924
## FixCount              0.14      0.05     0.05     0.23 1.00    16167    12004
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi    40.44      3.40    34.06    47.41 1.00    15995    10420
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
pp_check(mauroc4, nsamples = 100) + scale_y_continuous(trans='sqrt')

Comparison

Based on a 95% interval for the elpd difference (elpd_diff ± 1.96 · se_diff), M3 and M4 perform equally well. As M3 is the simpler model, we use it as our candidate model.

loo_mauroc1 = loo(mauroc1)
loo_mauroc2 = loo(mauroc2)
loo_mauroc3 = loo(mauroc3)
loo_mauroc4 = loo(mauroc4)
loo_compare(loo_mauroc1, loo_mauroc2, loo_mauroc3, loo_mauroc4)
##         elpd_diff se_diff
## mauroc4   0.0       0.0  
## mauroc3  -0.4       1.1  
## mauroc2 -65.6      11.3  
## mauroc1 -70.6      11.5
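
The 95% interval check can be made explicit; a small sketch using the elpd differences from the output above:

# A difference is distinguishable from 0 if the interval
# elpd_diff +- 1.96 * se_diff excludes 0.
-0.4 + c(-1, 1) * 1.96 * 1.1    # M3 vs M4: interval contains 0.
-65.6 + c(-1, 1) * 1.96 * 11.3  # M2 vs M4: interval excludes 0.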

4.2 EXAM Models

For the EXAM score we again use a beta likelihood, as it represents the fraction of the LOC one has to inspect to find a fault, averaged across all faults.

In our past work, the mean EXAM value was around 0.226, or roughly -1 on the logit scale.
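
As before, the logit value can be verified directly; a quick check (not part of the original script):

qlogis(0.226)  # Roughly -1.23 on the logit scale.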

As we presume most of the values to be small and somewhat concentrated, we expect the precision parameter phi to be fairly large. We put a wide prior on phi as we are not certain where exactly it will lie.

M3

\[ \begin{split} \mathcal{M}_3: \mathrm{EXAM_i} \thicksim &\ \mathrm{Beta}(\mu_i, \phi) \\ \mathrm{logit}(\mu_i) = &\ \alpha + \beta_{A} \mathrm{Algorithm}_i + \beta_{L} \mathrm{LOC}_i + \beta_F \mathrm{FixCount}_i + \alpha_{Project[i]}\\ \alpha \thicksim &\ \mathrm{Normal(-1, 1)} \\ \beta_{A}, \beta_{L}, \beta_F \thicksim &\ \mathrm{Normal(0, 0.15)} \\ \alpha_p \thicksim &\ \mathrm{Weibull}(2, 1)\ \ \ \mathrm{for}\ p = 1..32\\ \phi \thicksim &\ \mathrm{Normal(50, 20)} \end{split} \]

mexam3p = brm(
  formula = EXAM ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
  data = d,
  family = Beta(),
  prior = c(
    prior(normal(-1, 1), class=Intercept),
    prior(normal(0, 0.15), class=b),
    prior(weibull(2, 1), class=sd),
    prior(normal(50, 20), class=phi)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mexam3p, nsamples = 100) + scale_y_continuous(trans='sqrt')

mexam3= brm(
  formula = EXAM ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
  data = d,
  family = Beta(),
  prior = c(
    prior(normal(-1, 1), class=Intercept),
    prior(normal(0, 0.15), class=b),
    prior(weibull(2, 1), class=sd),
    prior(normal(50, 20), class=phi)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)

Results:

summary(mexam3)
##  Family: beta 
##   Links: mu = logit; phi = identity 
## Formula: EXAM ~ 1 + Algorithm + LOC + FixCount + (1 | Project) 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~Project (Number of levels: 32) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.23      0.03     0.18     0.31 1.00     3017     4487
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            -1.33      0.04    -1.42    -1.24 1.00     2123     4311
## AlgorithmBugspots     0.09      0.02     0.05     0.14 1.00    17942    12321
## LOC                  -0.07      0.03    -0.12    -0.01 1.00     7712    10194
## FixCount              0.01      0.02    -0.02     0.04 1.00    13109    12071
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi    86.15      5.52    75.73    97.38 1.00    15292    12372
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The posterior predictive check shows a good overall fit.

pp_check(mexam3, nsamples = 100) + scale_y_continuous(trans='sqrt')

On the logit scale, the effect of Bugspots is clearly positive, with no overlap with 0. On the outcome scale there is some overlap between the two algorithms, but the respective means lie outside the 95% intervals shown, and there is a clear tendency for Linespots to have lower EXAM values than Bugspots.

mcmc_areas(mexam3, pars = c("b_AlgorithmBugspots"), prob = 0.95) + ggtitle("EXAM Posterior Distribution\n With 95% intervals")

eff = conditional_effects(mexam3, effects = c("Algorithm"))
eff$Algorithm
eff

Diagnostics: all diagnostics look good.

min(neff_ratio(mexam3))
## [1] 0.1323598
max(rhat(mexam3))
## [1] 1.001445
plot(mexam3)

M1

mexam1p = brm(
  formula = EXAM ~ 1 + Algorithm + LOC,
  data = d,
  family = Beta(),
  prior = c(
    prior(normal(-1, 1), class=Intercept),
    prior(normal(0, 0.15), class=b),
    prior(normal(50, 20), class=phi)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mexam1p, nsamples = 100) + scale_y_continuous(trans='sqrt')

mexam1 = brm(
  formula = EXAM ~ 1 + Algorithm + LOC,
  data = d,
  family = Beta(),
  prior = c(
    prior(normal(-1, 1), class=Intercept),
    prior(normal(0, 0.15), class=b),
    prior(normal(50, 20), class=phi)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mexam1, nsamples = 100) + scale_y_continuous(trans='sqrt')

summary(mexam1)
##  Family: beta 
##   Links: mu = logit; phi = identity 
## Formula: EXAM ~ 1 + Algorithm + LOC 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            -1.30      0.02    -1.34    -1.26 1.00    14865    10711
## AlgorithmBugspots     0.09      0.03     0.03     0.15 1.00    14623    10199
## LOC                  -0.03      0.02    -0.06     0.00 1.00    14838    11222
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi    51.09      3.23    44.97    57.64 1.00    14239    11189
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

M2

mexam2p = brm(
  formula = EXAM ~ 1 + Algorithm + LOC + FixCount,
  data = d,
  family = Beta(),
  prior = c(
    prior(normal(-1, 1), class=Intercept),
    prior(normal(0, 0.15), class=b),
    prior(normal(50, 20), class=phi)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mexam2p, nsamples = 100) + scale_y_continuous(trans='sqrt')

mexam2= brm(
  formula = EXAM ~ 1 + Algorithm + LOC + FixCount,
  data = d,
  family = Beta(),
  prior = c(
    prior(normal(-1, 1), class=Intercept),
    prior(normal(0, 0.15), class=b),
    prior(normal(50, 20), class=phi)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mexam2, nsamples = 100) + scale_y_continuous(trans='sqrt')

summary(mexam2)
##  Family: beta 
##   Links: mu = logit; phi = identity 
## Formula: EXAM ~ 1 + Algorithm + LOC + FixCount 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            -1.30      0.02    -1.34    -1.26 1.00    15010    10648
## AlgorithmBugspots     0.09      0.03     0.03     0.15 1.00    16359    11268
## LOC                  -0.03      0.02    -0.06     0.00 1.00    17271    11433
## FixCount             -0.01      0.01    -0.04     0.02 1.00    17500    11478
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi    51.01      3.28    44.76    57.69 1.00    17602    10996
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

M4

mexam4p = brm(
  formula = EXAM ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
  data = d,
  family = Beta(),
  prior = c(
    prior(normal(-1, 1), class=Intercept),
    prior(normal(0, 0.15), class=b),
    prior(weibull(2, 1), class=sd),
    prior(normal(50, 20), class=phi)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mexam4p, nsamples = 100) + scale_y_continuous(trans='sqrt')

mexam4= brm(
  formula = EXAM ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
  data = d,
  family = Beta(),
  prior = c(
    prior(normal(-1, 1), class=Intercept),
    prior(normal(0, 0.15), class=b),
    prior(weibull(2, 1), class=sd),
    prior(normal(50, 20), class=phi)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mexam4, nsamples = 100) + scale_y_continuous(trans='sqrt')

summary(mexam4)
##  Family: beta 
##   Links: mu = logit; phi = identity 
## Formula: EXAM ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language) 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~language (Number of levels: 8) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.23      0.11     0.08     0.50 1.00     4199     5616
## 
## ~Project (Number of levels: 32) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.19      0.03     0.13     0.26 1.00     4741     8248
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            -1.29      0.10    -1.48    -1.08 1.00     6814     7048
## AlgorithmBugspots     0.09      0.02     0.05     0.14 1.00    25166    11031
## LOC                  -0.07      0.03    -0.12    -0.02 1.00    14248    12899
## FixCount              0.00      0.02    -0.03     0.03 1.00    20418    12551
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi    86.23      5.51    75.75    97.33 1.00    21395    11079
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Comparison

Based on a 95% interval for the elpd difference, M3 and M4 perform equally well. As M3 is the simpler model, we use it as our candidate model.

loo_mexam1 = loo(mexam1)
loo_mexam2 = loo(mexam2)
loo_mexam3 = loo(mexam3)
loo_mexam4 = loo(mexam4)
loo_compare(loo_mexam1, loo_mexam2, loo_mexam3, loo_mexam4)

4.3 EInspectF Models

For EInspectF we use a negative binomial likelihood, as it is a count outcome. We also considered a Poisson likelihood, but the large difference between mean and variance (overdispersion) makes the negative binomial the better candidate.

We do not have past experience with this value, but based on the results of Zou et al. (2019) we use a prior mean of 500 LOC, or roughly 6 on the log scale.
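
The log-scale value can be verified directly; a quick check (not part of the original script):

log(500)  # Roughly 6.2 on the log scale.
exp(6)    # A prior mean of 6 corresponds to roughly 403 LOC.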

M3

\[ \begin{split} \mathcal{M}_3: \mathrm{EInspectF_i} \thicksim &\ \mathrm{NegBinom}(\mu_i, \phi) \\ \mathrm{log}(\mu_i) = &\ \alpha + \beta_{A} \mathrm{Algorithm}_i + \beta_{L} \mathrm{LOC}_i + \beta_F \mathrm{FixCount}_i + \alpha_{Project[i]} \\ \alpha \thicksim &\ \mathrm{Normal(6, 1)} \\ \beta_{A}, \beta_{L}, \beta_F \thicksim &\ \mathrm{Normal(0, 1)} \\ \alpha_p \thicksim &\ \mathrm{Weibull}(2, 1)\ \ \ \mathrm{for}\ p = 1..32\\ \phi \thicksim &\ \mathrm{Weibull}(2, 1) \\ \end{split} \]

mEInspectF3p = brm(
  formula = EInspectF ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(6, 1), class=Intercept),
    prior(normal(0, 1), class=b),
    prior(weibull(2, 1), class=sd),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspectF3p, nsamples = 100) + scale_x_continuous(trans='log10')

mEInspectF3 = brm(
  formula = EInspectF ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(6, 1), class=Intercept),
    prior(normal(0, 1), class=b),
    prior(weibull(2, 1), class=sd),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)

Results:

summary(mEInspectF3)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: EInspectF ~ 1 + Algorithm + LOC + FixCount + (1 | Project) 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~Project (Number of levels: 32) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.40      0.19     1.08     1.80 1.00     3210     5397
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept             5.70      0.25     5.20     6.19 1.00     2012     3290
## AlgorithmBugspots     1.45      0.13     1.19     1.71 1.00    14879    11538
## LOC                   0.02      0.15    -0.27     0.32 1.00     8735    10682
## FixCount             -0.57      0.08    -0.72    -0.42 1.00    10648    10574
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     0.62      0.03     0.56     0.69 1.00    15590    11175
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The comparison below shows very similar LOO performance for M3 and M4. We chose M3 as our final model because it shows very good sampling behavior and is the smaller of the two models.

The posterior predictive check shows a good overall fit.

pp_check(mEInspectF3, nsamples = 100) + scale_x_continuous(trans='log10')

The 95% interval of the effect of ‘Bugspots’ on the log scale is completely positive and well away from 0. On the outcome scale there is still substantial overlap between the two algorithms, but the expected EInspectF is clearly higher for Bugspots than for Linespots.

mcmc_areas(mEInspectF3, pars = c("b_AlgorithmBugspots"), prob = 0.95) + ggtitle("EInspectF Posterior Distribution\n With 95% intervals")

eff = conditional_effects(mEInspectF3, effects = c("Algorithm"))
eff$Algorithm
eff
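Because the model uses a log link, the Bugspots coefficient can be read on the outcome scale by exponentiating it; the values below are copied from the summary output above:

```r
# Posterior mean and 95% CI of AlgorithmBugspots on the log scale
b <- c(est = 1.45, lower = 1.19, upper = 1.71)
exp(b)  # multiplicative effect on expected EInspectF, roughly 3.3x to 5.5x
```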

Diagnostics: all of them look good.

min(neff_ratio(mEInspectF3))
## [1] 0.127741
max(rhat(mEInspectF3))
## [1] 1.002961
plot(mEInspectF3)
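As a reference for what "good" means here, two common rules of thumb (our addition, not from the source) are an effective sample size ratio above 0.1 and Rhat below 1.01:

```r
# Values copied from the diagnostics output above
neff_min <- 0.127741
rhat_max <- 1.002961
neff_min > 0.1    # TRUE: acceptable effective sample size ratio
rhat_max < 1.01   # TRUE: converged by the usual Rhat threshold
```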

M1

mEInspectF1p = brm(
  formula = EInspectF ~ 1 + Algorithm + LOC,
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(6, 1), class=Intercept),
    prior(normal(0, 1), class=b),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspectF1p, nsamples = 100) + scale_x_continuous(trans='log10')

mEInspectF1 = brm(
  formula = EInspectF ~ 1 + Algorithm + LOC,
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(6, 1), class=Intercept),
    prior(normal(0, 1), class=b),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspectF1, nsamples = 100) + scale_x_continuous(trans='log10')

summary(mEInspectF1)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: EInspectF ~ 1 + Algorithm + LOC 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept             6.91      0.11     6.69     7.14 1.00    14364    10960
## AlgorithmBugspots     1.44      0.16     1.13     1.75 1.00    13600    11402
## LOC                   1.10      0.15     0.82     1.39 1.00    13918    10434
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     0.34      0.02     0.30     0.37 1.00    13645    10606
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

M2

mEInspectF2p = brm(
  formula = EInspectF ~ 1 + Algorithm + LOC + FixCount,
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(6, 1), class=Intercept),
    prior(normal(0, 1), class=b),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspectF2p, nsamples = 100) + scale_x_continuous(trans='log10')

mEInspectF2 = brm(
  formula = EInspectF ~ 1 + Algorithm + LOC +  FixCount,
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(6, 1), class=Intercept),
    prior(normal(0, 1), class=b),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspectF2, nsamples = 100) + scale_x_continuous(trans='log10')

summary(mEInspectF2)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: EInspectF ~ 1 + Algorithm + LOC + FixCount 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept             6.45      0.10     6.26     6.66 1.00    17040    11248
## AlgorithmBugspots     1.59      0.15     1.30     1.87 1.00    15349    10498
## LOC                   1.32      0.14     1.05     1.60 1.00    16656     9711
## FixCount             -0.95      0.06    -1.07    -0.83 1.00    16382    11563
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     0.40      0.02     0.36     0.45 1.00    16242    11779
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

M4

mEInspectF4p = brm(
  formula = EInspectF ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(6, 1), class=Intercept),
    prior(normal(0, 1), class=b),
    prior(weibull(2, 1), class=sd),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspectF4p, nsamples = 100) + scale_x_continuous(trans='log10')

mEInspectF4= brm(
  formula = EInspectF ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(6, 1), class=Intercept),
    prior(normal(0, 1), class=b),
    prior(weibull(2, 1), class=sd),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspectF4, nsamples = 100) + scale_x_continuous(trans='log10')

summary(mEInspectF4)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: EInspectF ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language) 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~language (Number of levels: 8) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.00      0.32     0.42     1.71 1.00     6544     6112
## 
## ~Project (Number of levels: 32) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.13      0.18     0.83     1.53 1.00     5662     9674
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept             5.73      0.40     4.91     6.51 1.00     9649    10245
## AlgorithmBugspots     1.46      0.13     1.20     1.72 1.00    22342    11553
## LOC                  -0.04      0.15    -0.32     0.25 1.00    17370    11623
## FixCount             -0.55      0.07    -0.70    -0.41 1.00    19721    12519
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     0.62      0.04     0.56     0.70 1.00    20939    10976
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Comparison

Based on a 95% z score (i.e., the elpd difference lying within roughly two standard errors of zero), M3 and M4 perform equally well. As M3 is the simpler model, we use it as our candidate model.

loo_mEInspectF1 = loo(mEInspectF1)
loo_mEInspectF2 = loo(mEInspectF2)
loo_mEInspectF3 = loo(mEInspectF3)
loo_mEInspectF4 = loo(mEInspectF4)
loo_compare(loo_mEInspectF1, loo_mEInspectF2, loo_mEInspectF3, loo_mEInspectF4)
##             elpd_diff se_diff
## mEInspectF4    0.0       0.0 
## mEInspectF3   -1.0       1.1 
## mEInspectF2 -140.0      16.0 
## mEInspectF1 -208.1      19.9
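The decision rule used here treats two models as performing equally well when the elpd difference lies within roughly two standard errors of zero. With the values from the table above:

```r
# M3 relative to M4 (the best model), from the loo_compare output
elpd_diff <- -1.0
se_diff   <-  1.1
z <- elpd_diff / se_diff
abs(z) < 2              # TRUE: M3 and M4 are indistinguishable
abs(-140.0 / 16.0) < 2  # FALSE: M2 is clearly worse
```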

4.4 EInspect10 Models

For EInspect10 we use a negative binomial likelihood, as it is a count outcome. We also considered a Poisson, but the difference between mean and variance in the data makes the negative binomial the better candidate.

We do not have experience with either algorithm for this metric, but zouEmpiricalStudyFault2019 reports an EInspect value of 0 for Bugspots. Based on that, we set a low prior mean of 0.01, or roughly -4 on the log scale.
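The link-scale translation of that prior can be verified directly (arithmetic only; the 0.01 figure is the source's choice):

```r
log(0.01)  # about -4.6, i.e. "roughly -4" on the log link scale
exp(-4)    # about 0.018, the rate implied by an intercept of exactly -4
```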

M3

\[ \begin{split} \mathcal{M}_3: \mathrm{EInspect10_i} \thicksim &\ \mathrm{NegBinom}(\mu_i, \phi) \\ \log(\mu_i) = &\ \alpha + \beta_{A} \mathrm{Algorithm}_i + \beta_{L} \mathrm{LOC}_i + \beta_F \mathrm{FixCount}_i + \alpha_{Project[i]} \\ \alpha \thicksim &\ \mathrm{Normal(-4, 0.5)} \\ \beta_{A}, \beta_{L}, \beta_F \thicksim &\ \mathrm{Normal(0, 0.25)} \\ \alpha_p \thicksim &\ \mathrm{Weibull}(2, 1)\ \ \ \mathrm{for}\ p = 1..32\\ \phi \thicksim &\ \mathrm{Weibull}(2, 1) \\ \end{split} \]

mEInspect103p = brm(
  formula = EInspect10 ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(-4, 0.5), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=sd),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspect103p, nsamples = 100) + scale_x_continuous(trans='log10')

mEInspect103 = brm(
  formula = EInspect10 ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(-4, 0.5), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=sd),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)

Results:

summary(mEInspect103)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: EInspect10 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~Project (Number of levels: 32) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.33      0.32     0.75     2.02 1.00     5254     8321
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            -3.04      0.35    -3.77    -2.40 1.00     7231    10822
## AlgorithmBugspots    -0.40      0.20    -0.79    -0.01 1.00    26087    11934
## LOC                  -0.16      0.20    -0.56     0.24 1.00    15861    12654
## FixCount              0.13      0.14    -0.14     0.39 1.00    15648    12451
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     1.31      0.42     0.61     2.22 1.00    22923    11028
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The comparison shows equal LOO performance for M3 and M4. We chose M3 as our candidate model, as it is the smaller one.

The posterior predictive check shows that the model fits the data well with only a slightly longer tail.

pp_check(mEInspect103, nsamples = 100) + scale_x_continuous(trans='log10')

The 95% interval of the effect of ‘Bugspots’ on the log scale is completely negative, although its upper bound comes very close to 0. On the outcome scale, the overlap between the two algorithms is large. While the means and lower bounds are close, the upper bound for Linespots is higher than for Bugspots.

mcmc_areas(mEInspect103, pars = c("b_AlgorithmBugspots"), prob = 0.95) + ggtitle("EInspect10 Posterior Distribution\n With 95% intervals")

eff = conditional_effects(mEInspect103, effects = c("Algorithm"))
eff$Algorithm
eff

Diagnostics: all of them look good.

min(neff_ratio(mEInspect103))
## [1] 0.2338744
max(rhat(mEInspect103))
## [1] 1.000227
plot(mEInspect103)

M1

mEInspect101p = brm(
  formula = EInspect10 ~ 1 + Algorithm + LOC,
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(-4, 0.5), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspect101p, nsamples = 100) + scale_x_continuous(trans='log10')

mEInspect101 = brm(
  formula = EInspect10 ~ 1 + Algorithm + LOC,
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(-4, 0.5), class=Intercept),
    prior(normal(0, 0.5), class=b),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspect101, nsamples = 100) + scale_x_continuous(trans='log10')

summary(mEInspect101)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: EInspect10 ~ 1 + Algorithm + LOC 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            -2.17      0.19    -2.55    -1.81 1.00    13976    10777
## AlgorithmBugspots    -0.82      0.28    -1.38    -0.28 1.00    12353    10943
## LOC                  -0.49      0.26    -1.04    -0.03 1.00    11104     8599
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     1.07      0.41     0.43     2.01 1.00    11858     9940
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

M2

mEInspect102p = brm(
  formula = EInspect10 ~ 1 + Algorithm + LOC + FixCount,
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(-4, 0.5), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspect102p, nsamples = 100) + scale_x_continuous(trans='log10')

mEInspect102 = brm(
  formula = EInspect10 ~ 1 + Algorithm + LOC + FixCount,
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(-4, 0.5), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspect102, nsamples = 100) + scale_x_continuous(trans='log10')

summary(mEInspect102)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: EInspect10 ~ 1 + Algorithm + LOC + FixCount 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            -2.31      0.18    -2.66    -1.97 1.00    17396    11372
## AlgorithmBugspots    -0.42      0.20    -0.81    -0.03 1.00    16910    11519
## LOC                  -0.29      0.17    -0.65     0.02 1.00    15694    10689
## FixCount              0.22      0.12    -0.02     0.44 1.00    15929    10689
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     1.07      0.41     0.43     2.00 1.00    15902    10390
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

M4

mEInspect104p = brm(
  formula = EInspect10 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(-4, 0.5), class=Intercept),
    prior(normal(0, 0.3), class=b),
    prior(weibull(2, 1), class=sd),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspect104p, nsamples = 100) + scale_x_continuous(trans='log10')

mEInspect104= brm(
  formula = EInspect10 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(-4, 0.5), class=Intercept),
    prior(normal(0, 0.3), class=b),
    prior(weibull(2, 1), class=sd),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspect104, nsamples = 100) + scale_x_continuous(trans='log10')

summary(mEInspect104)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: EInspect10 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language) 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~language (Number of levels: 8) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.11      0.37     0.45     1.92 1.00     7021     6489
## 
## ~Project (Number of levels: 32) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.87      0.35     0.24     1.60 1.00     4181     5741
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            -3.28      0.40    -4.10    -2.52 1.00    10753    12239
## AlgorithmBugspots    -0.49      0.22    -0.92    -0.06 1.00    21691    11280
## LOC                  -0.11      0.23    -0.58     0.35 1.00    16340    12736
## FixCount              0.13      0.14    -0.15     0.39 1.00    18232    12427
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     1.31      0.42     0.60     2.24 1.00    24336    10413
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Comparison

Based on a 95% z score (i.e., the elpd difference lying within roughly two standard errors of zero), M3 and M4 perform equally well. As M3 is the simpler model, we use it as our candidate model.

loo_mEInspect101 = loo(mEInspect101)
loo_mEInspect102 = loo(mEInspect102)
loo_mEInspect103 = loo(mEInspect103)
loo_mEInspect104 = loo(mEInspect104)
loo_compare(loo_mEInspect101, loo_mEInspect102, loo_mEInspect103, loo_mEInspect104)
##              elpd_diff se_diff
## mEInspect104   0.0       0.0  
## mEInspect103  -0.8       1.5  
## mEInspect101 -13.8       4.3  
## mEInspect102 -14.0       4.0

4.5 EInspect100 Models

For EInspect100 we use a negative binomial likelihood, as it is a count outcome. We also considered a Poisson, but the difference between mean and variance in the data makes the negative binomial the better candidate.

We do not have past experience with this metric, so we take the prior mean of the EInspect10 models and multiply it by 5. We do not multiply by 10 because, while we expect more faults, the ranking-based inspection makes us expect sublinear growth. An intercept mean of 0.05 translates to roughly -3 on the log scale.
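Again, the link-scale value follows directly from the outcome-scale prior mean:

```r
5 * 0.01   # 0.05: five times the EInspect10 prior mean
log(0.05)  # about -3.0 on the log link scale
```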

M3

\[ \begin{split} \mathcal{M}_3: \mathrm{EInspect100_i} \thicksim &\ \mathrm{NegBinom}(\mu_i, \phi) \\ \log(\mu_i) = &\ \alpha + \beta_{A} \mathrm{Algorithm}_i + \beta_{L} \mathrm{LOC}_i + \beta_F \mathrm{FixCount}_i + \alpha_{Project[i]} \\ \alpha \thicksim &\ \mathrm{Normal(-3, 0.5)} \\ \beta_{A}, \beta_{L}, \beta_F \thicksim &\ \mathrm{Normal(0, 0.25)} \\ \alpha_p \thicksim &\ \mathrm{Weibull}(2, 1)\ \ \ \mathrm{for}\ p = 1..32\\ \phi \thicksim &\ \mathrm{Weibull}(2, 1) \\ \end{split} \]

mEInspect1003p = brm(
  formula = EInspect100 ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(-3, 0.5), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=sd),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspect1003p, nsamples = 100) + scale_x_continuous(trans='log10')

mEInspect1003 = brm(
  formula = EInspect100 ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(-3, 0.5), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=sd),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)

Results:

summary(mEInspect1003)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: EInspect100 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~Project (Number of levels: 32) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.18      0.24     0.78     1.71 1.00     3537     6382
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            -1.11      0.27    -1.68    -0.64 1.00     3411     5643
## AlgorithmBugspots    -0.47      0.13    -0.73    -0.21 1.00    17516    11161
## LOC                  -0.23      0.16    -0.55     0.08 1.00    10244    11322
## FixCount              0.23      0.08     0.08     0.40 1.00    12401    12197
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     1.37      0.28     0.91     1.98 1.00    16274    11240
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The comparison below shows very similar LOO performance for M3 and M4. We chose M3 as our final model because it shows very good sampling behavior and is the smaller of the two models.

The posterior predictive check shows a good overall fit.

pp_check(mEInspect1003, nsamples = 100) + scale_x_continuous(trans='log10')

The 95% interval of the effect of ‘Bugspots’ on the log scale is completely negative, with overlap with 0 only further out in the tail. On the outcome scale, the overlap between the two algorithms is large. While the means and lower bounds are close, the upper bound for Linespots is higher than for Bugspots.

mcmc_areas(mEInspect1003, pars = c("b_AlgorithmBugspots"), prob = 0.95) + ggtitle("EInspect100 Posterior Distribution\n With 95% intervals")

eff = conditional_effects(mEInspect1003, effects = c("Algorithm"))
eff$Algorithm
eff

Diagnostics: all of them look good.

min(neff_ratio(mEInspect1003))
## [1] 0.1807117
max(rhat(mEInspect1003))
## [1] 1.001631
plot(mEInspect1003)

M1

mEInspect1001p = brm(
  formula = EInspect100 ~ 1 + Algorithm + LOC,
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(-3, 0.5), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspect1001p, nsamples = 100) + scale_x_continuous(trans='log10')

mEInspect1001 = brm(
  formula = EInspect100 ~ 1 + Algorithm + LOC,
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(-3, 0.5), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspect1001, nsamples = 100) + scale_x_continuous(trans='log10')

summary(mEInspect1001)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: EInspect100 ~ 1 + Algorithm + LOC 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            -0.39      0.11    -0.60    -0.17 1.00    13003    11232
## AlgorithmBugspots    -0.35      0.14    -0.62    -0.07 1.00    13559    10168
## LOC                  -0.43      0.12    -0.69    -0.20 1.00    14109    10594
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     0.56      0.09     0.40     0.76 1.00    13854    10803
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

M2

mEInspect1002p = brm(
  formula = EInspect100 ~ 1 + Algorithm + LOC + FixCount,
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(-3, 0.5), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspect1002p, nsamples = 100) + scale_x_continuous(trans='log10')

mEInspect1002 = brm(
  formula = EInspect100 ~ 1 + Algorithm + LOC + FixCount,
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(-3, 0.5), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspect1002, nsamples = 100) + scale_x_continuous(trans='log10')

summary(mEInspect1002)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: EInspect100 ~ 1 + Algorithm + LOC + FixCount 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            -0.49      0.11    -0.70    -0.29 1.00    16002    10707
## AlgorithmBugspots    -0.35      0.14    -0.62    -0.07 1.00    16079    10467
## LOC                  -0.49      0.13    -0.74    -0.25 1.00    14761    11725
## FixCount              0.43      0.07     0.28     0.57 1.00    15608    10919
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     0.72      0.13     0.50     1.01 1.00    14112    10454
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

M4

mEInspect1004p = brm(
  formula = EInspect100 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(-3, 0.5), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=sd),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspect1004p, nsamples = 100) + scale_x_continuous(trans='log10')

mEInspect1004= brm(
  formula = EInspect100 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
  data = d,
  family = negbinomial(),
  prior = c(
    prior(normal(-3, 0.5), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(weibull(2, 1), class=sd),
    prior(weibull(2, 1), class=shape)
  ),
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(mEInspect1004, nsamples = 100) + scale_x_continuous(trans='log10')

summary(mEInspect1004)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: EInspect100 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language) 
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~language (Number of levels: 8) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.04      0.36     0.42     1.83 1.00     5569     5802
## 
## ~Project (Number of levels: 32) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.87      0.23     0.49     1.39 1.00     4083     7817
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept            -1.59      0.42    -2.48    -0.84 1.00     7030     9277
## AlgorithmBugspots    -0.47      0.13    -0.73    -0.21 1.00    25809    11554
## LOC                  -0.10      0.16    -0.43     0.21 1.00    16537    12260
## FixCount              0.24      0.08     0.08     0.39 1.00    18753    12152
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     1.36      0.27     0.91     1.97 1.00    23146    12396
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Comparison

Judged by a rough 95% interval on the elpd difference (elpd_diff ± 2 se_diff), M3 and M4 perform equally well. As M3 is the simpler model, we use it as our candidate model.

loo_mEInspect1001 = loo(mEInspect1001)
loo_mEInspect1002 = loo(mEInspect1002)
loo_mEInspect1003 = loo(mEInspect1003)
loo_mEInspect1004 = loo(mEInspect1004)
loo_compare(loo_mEInspect1001, loo_mEInspect1002, loo_mEInspect1003, loo_mEInspect1004)
##               elpd_diff se_diff
## mEInspect1003   0.0       0.0  
## mEInspect1004  -1.2       1.2  
## mEInspect1002 -43.8       8.3  
## mEInspect1001 -60.5      10.1
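The "95% interval" rule behind this judgment can be sketched as a small helper (illustrative, not part of the original script): two models count as indistinguishable when the elpd difference lies within roughly two standard errors of zero.

```r
# Illustrative equivalence check on loo_compare output: treat two models as
# performing equally well when |elpd_diff| <= 2 * se_diff, i.e. a rough 95%
# interval around the difference contains zero.
elpd_equivalent <- function(elpd_diff, se_diff) {
  abs(elpd_diff) <= 2 * se_diff
}

elpd_equivalent(-1.2, 1.2)   # TRUE: M4 is indistinguishable from M3
elpd_equivalent(-43.8, 8.3)  # FALSE: M2 is clearly worse than M3
```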

4.6 AUCEC1 Models

For the area under the cost-effectiveness curve with a cut-off at 1% of LOC (AUCEC1) we use a zero-inflated beta likelihood. The AUCEC values are all normalized by their respective optimal value, which puts them between 0 and 1. However, the low cut-off leads to some results of exactly 0, which is why we need the zero inflation.
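As a sketch of this likelihood (assuming brms' mean/precision parameterization of the beta part, with zero-inflation probability zi; the function name dzibeta is ours):

```r
# Zero-inflated beta: a point mass of zi at exactly 0, with the remaining
# (1 - zi) probability spread as Beta(mu * phi, (1 - mu) * phi) over (0, 1).
dzibeta <- function(y, mu, phi, zi) {
  ifelse(y == 0, zi, (1 - zi) * dbeta(y, mu * phi, (1 - mu) * phi))
}

dzibeta(0, mu = 0.1, phi = 40, zi = 0.05)    # the zero spike: 0.05
dzibeta(0.1, mu = 0.1, phi = 40, zi = 0.05)  # down-weighted beta density
```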

We do not have past experience with this metric, but based on Arisholm et al. (2010) we center the intercept prior on 0.1, or roughly -2 on the logit scale.
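The correspondence between the two scales can be verified with the inverse logit (a quick sanity check, not part of the original script):

```r
# plogis is R's inverse logit: an intercept of -2 on the logit scale maps
# back to roughly the intended 0.1 on the outcome scale.
plogis(-2)
## [1] 0.1192029
```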

As we presume most of the values to be small and somewhat concentrated, we expect the precision parameter phi to be fairly large. We nevertheless put a wide prior on phi, as we are not certain where exactly it will lie.
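To see how phi encodes concentration: in the mean/precision parameterization, the beta part has standard deviation sqrt(mu * (1 - mu) / (1 + phi)), so a larger phi pulls values more tightly around the mean mu (illustrative numbers, not estimates from the data):

```r
# Standard deviation of Beta(mu * phi, (1 - mu) * phi) as a function of phi.
beta_sd <- function(mu, phi) sqrt(mu * (1 - mu) / (1 + phi))

beta_sd(0.1, 10)  # ~0.09: fairly diffuse around the mean
beta_sd(0.1, 50)  # ~0.04: concentrated near 0.1
```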

M2

\[ \begin{split} \mathcal{M}_2: \mathrm{AUCEC1}_i \thicksim &\ \mathrm{ZIBeta}(\mu_i, \phi, \lambda_i) \\ \mathrm{logit}(\mu_i) = &\ \alpha + \beta_{A} \mathrm{Algorithm}_i + \beta_{L} \mathrm{LOC}_i + \beta_F \mathrm{FixCount}_i \\ \mathrm{logit}(\lambda_i) = &\ \beta_{ziA} \mathrm{Algorithm}_i + \alpha_{ziProject[i]} \\ \alpha \thicksim &\ \mathrm{Normal}(-2, 1) \\ \beta_{A}, \beta_{L}, \beta_F \thicksim &\ \mathrm{Normal}(0, 0.25) \\ \beta_{ziA} \thicksim &\ \mathrm{Normal}(0, 2)\\ \alpha_{zip} \thicksim &\ \mathrm{Logistic}(0, 1)\ \ \ \mathrm{for}\ zip = 1..32 \\ \phi \thicksim &\ \mathrm{Normal}(50, 20) \\ \end{split} \]

maucec12p = brm(
  bf(AUCEC1 ~ 1 + Algorithm + LOC + FixCount, zi ~ Algorithm + (1 | Project)),
  data = d,
  family = zero_inflated_beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(0, 2), class=b, dpar="zi"),
    prior(normal(50, 20), class=phi)
  ),
  init = 0,
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(maucec12p, nsamples = 100) + scale_y_continuous(trans='sqrt')

maucec12 = brm(
  bf(AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project), zi ~ Algorithm + (1 | Project)),
  data = d,
  family = zero_inflated_beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(0, 2), class=b, dpar="zi"),
    prior(normal(50, 20), class=phi)
  ),
  init = 0,
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)

Results:

summary(maucec12)
##  Family: zero_inflated_beta 
##   Links: mu = logit; phi = identity; zi = logit 
## Formula: AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) 
##          zi ~ Algorithm + (1 | Project)
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~Project (Number of levels: 32) 
##                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)        0.41      0.07     0.30     0.56 1.00     4212     7350
## sd(zi_Intercept)     2.28      0.62     1.32     3.74 1.00     5235     8181
## 
## Population-Level Effects: 
##                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept               -2.17      0.08    -2.33    -2.01 1.00     3493
## zi_Intercept            -5.39      0.80    -7.14    -4.00 1.00     8024
## AlgorithmBugspots       -0.49      0.05    -0.59    -0.39 1.00    23185
## LOC                      0.25      0.05     0.15     0.34 1.00     9471
## FixCount                -0.07      0.03    -0.13    -0.01 1.00    17302
## zi_AlgorithmBugspots     2.05      0.52     1.10     3.15 1.00    22102
##                      Tail_ESS
## Intercept                6434
## zi_Intercept             8985
## AlgorithmBugspots       12418
## LOC                     10836
## FixCount                13004
## zi_AlgorithmBugspots    11493
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi    39.24      2.75    33.98    44.84 1.00    18771    11813
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The posterior predictive check shows a good fit.

pp_check(maucec12, nsamples = 100) + scale_y_continuous(trans='sqrt')

The effect of Bugspots on the logit scale is entirely negative, with no overlap with 0. On the outcome scale, there is no overlap between the two algorithms, with Linespots consistently achieving higher values than Bugspots.

mcmc_areas(maucec12, pars = c("b_AlgorithmBugspots"), prob = 0.95) + ggtitle("AUCEC1 Posterior Distribution\n With 95% intervals")

eff = conditional_effects(maucec12, effects = c("Algorithm"))
eff$Algorithm
eff

And the diagnostics: all look good, with the smallest effective sample size ratio well above 0.1 and the largest Rhat below 1.01.

min(neff_ratio(maucec12))
## [1] 0.2046774
max(rhat(maucec12))
## [1] 1.001567
plot(maucec12)

M1

maucec11p = brm(
  bf(AUCEC1 ~ 1 + Algorithm + LOC, zi ~ Algorithm + (1 | Project)),
  data = d,
  family = zero_inflated_beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(0, 2), class=b, dpar="zi"),
    prior(normal(50, 20), class=phi)
  ),
  init = 0,
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(maucec11p, nsamples = 100) + scale_y_continuous(trans='sqrt')

maucec11 = brm(
  bf(AUCEC1 ~ 1 + Algorithm + LOC, zi ~ Algorithm + (1 | Project)),
  data = d,
  family = zero_inflated_beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(0, 2), class=b, dpar="zi"),
    prior(normal(50, 20), class=phi)
  ),
  init = 0,
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(maucec11, nsamples = 100) + scale_y_continuous(trans='sqrt')

summary(maucec11)
##  Family: zero_inflated_beta 
##   Links: mu = logit; phi = identity; zi = logit 
## Formula: AUCEC1 ~ 1 + Algorithm + LOC 
##          zi ~ Algorithm + (1 | Project)
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~Project (Number of levels: 32) 
##                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(zi_Intercept)     2.29      0.63     1.33     3.78 1.00     4763     7661
## 
## Population-Level Effects: 
##                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept               -2.13      0.04    -2.21    -2.06 1.00    21858
## zi_Intercept            -5.39      0.80    -7.17    -3.99 1.00     6962
## AlgorithmBugspots       -0.45      0.06    -0.57    -0.34 1.00    18373
## LOC                      0.10      0.03     0.05     0.15 1.00    19145
## zi_AlgorithmBugspots     2.06      0.53     1.09     3.16 1.00    18603
##                      Tail_ESS
## Intercept               12111
## zi_Intercept             7907
## AlgorithmBugspots       11573
## LOC                     10963
## zi_AlgorithmBugspots    11734
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi    26.67      1.85    23.16    30.42 1.00    15799    11385
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

M3

maucec13p = brm(
  bf(AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project), zi ~ Algorithm + (1 | Project)),
  data = d,
  family = zero_inflated_beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(0, 2), class=b, dpar="zi"),
    prior(weibull(2, 1), class=sd),
    prior(normal(50, 20), class=phi)
  ),
  init = 0,
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(maucec13p, nsamples = 100) + scale_y_continuous(trans='sqrt')

maucec13 = brm(
  bf(AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project), zi ~ Algorithm + (1 | Project)),
  data = d,
  family = zero_inflated_beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(0, 2), class=b, dpar="zi"),
    prior(weibull(2, 1), class=sd),
    prior(normal(50, 20), class=phi)
  ),
  init = 0,
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(maucec13, nsamples = 100) + scale_y_continuous(trans='sqrt')

summary(maucec13)
##  Family: zero_inflated_beta 
##   Links: mu = logit; phi = identity; zi = logit 
## Formula: AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) 
##          zi ~ Algorithm + (1 | Project)
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~Project (Number of levels: 32) 
##                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)        0.42      0.07     0.31     0.58 1.00     3692     5914
## sd(zi_Intercept)     2.28      0.61     1.32     3.74 1.00     5230     7440
## 
## Population-Level Effects: 
##                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept               -2.18      0.08    -2.34    -2.02 1.00     3382
## zi_Intercept            -5.38      0.79    -7.07    -3.99 1.00     7531
## AlgorithmBugspots       -0.49      0.05    -0.59    -0.39 1.00    19323
## LOC                      0.25      0.05     0.15     0.35 1.00     7594
## FixCount                -0.07      0.03    -0.13    -0.01 1.00    13525
## zi_AlgorithmBugspots     2.06      0.53     1.08     3.14 1.00    17396
##                      Tail_ESS
## Intercept                6064
## zi_Intercept             8299
## AlgorithmBugspots       11761
## LOC                      9947
## FixCount                11882
## zi_AlgorithmBugspots    10806
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi    39.30      2.72    34.13    44.78 1.00    16522    11712
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

M4

maucec14p = brm(
  bf(AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language), zi ~ Algorithm + (1 | Project)),
  data = d,
  family = zero_inflated_beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(0, 2), class=b, dpar="zi"),
    prior(weibull(2, 1), class=sd),
    prior(normal(50, 20), class=phi)
  ),
  init = 0,
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(maucec14p, nsamples = 100) + scale_y_continuous(trans='sqrt')

maucec14 = brm(
  bf(AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language), zi ~ Algorithm + (1 | Project)),
  data = d,
  family = zero_inflated_beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(0, 2), class=b, dpar="zi"),
    prior(weibull(2, 1), class=sd),
    prior(normal(50, 20), class=phi)
  ),
  init = 0,
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(maucec14, nsamples = 100) + scale_y_continuous(trans='sqrt')

summary(maucec14)
##  Family: zero_inflated_beta 
##   Links: mu = logit; phi = identity; zi = logit 
## Formula: AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language) 
##          zi ~ Algorithm + (1 | Project)
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~language (Number of levels: 8) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.37      0.17     0.11     0.79 1.00     4355     4967
## 
## ~Project (Number of levels: 32) 
##                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)        0.36      0.07     0.25     0.51 1.00     5259     8958
## sd(zi_Intercept)     2.27      0.63     1.30     3.77 1.00     5397     8797
## 
## Population-Level Effects: 
##                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept               -2.20      0.16    -2.53    -1.87 1.00     6942
## zi_Intercept            -5.37      0.79    -7.08    -3.97 1.00     8757
## AlgorithmBugspots       -0.49      0.05    -0.59    -0.40 1.00    29558
## LOC                      0.28      0.05     0.18     0.37 1.00    14255
## FixCount                -0.07      0.03    -0.14    -0.01 1.00    22614
## zi_AlgorithmBugspots     2.05      0.53     1.08     3.15 1.00    25097
##                      Tail_ESS
## Intercept                7564
## zi_Intercept             9938
## AlgorithmBugspots       11591
## LOC                     11953
## FixCount                12556
## zi_AlgorithmBugspots    11732
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi    39.48      2.75    34.27    45.08 1.00    25254    11685
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Comparison

Judged by a rough 95% interval on the elpd difference (elpd_diff ± 2 se_diff), M3 and M4 perform equally well. As M3 is the simpler model, we use it as our candidate model.

loo_maucec11 = loo(maucec11)
loo_maucec12 = loo(maucec12)
loo_maucec13 = loo(maucec13)
loo_maucec14 = loo(maucec14)
loo_compare(loo_maucec11, loo_maucec12, loo_maucec13, loo_maucec14)
##          elpd_diff se_diff
## maucec14   0.0       0.0  
## maucec13  -1.1       1.2  
## maucec12  -1.1       1.2  
## maucec11 -69.3      10.5

4.7 AUCEC5 Models

For the area under the cost-effectiveness curve with a cut-off at 5% of LOC (AUCEC5) we use a zero-inflated beta likelihood. The AUCEC values are all normalized by their respective optimal value, which puts them between 0 and 1. However, the cut-off leads to some results of exactly 0, which is why we need the zero inflation.

We do not have past experience with this metric, but based on Arisholm et al. (2010) we center the intercept prior on 0.1, or roughly -2 on the logit scale.

As we presume most of the values to be small and somewhat concentrated, we expect the precision parameter phi to be fairly large. We nevertheless put a wide prior on phi, as we are not certain where exactly it will lie.

M2

\[ \begin{split} \mathcal{M}_2: \mathrm{AUCEC5}_i \thicksim &\ \mathrm{ZIBeta}(\mu_i, \phi, \lambda_i) \\ \mathrm{logit}(\mu_i) = &\ \alpha + \beta_{A} \mathrm{Algorithm}_i + \beta_{L} \mathrm{LOC}_i + \beta_F \mathrm{FixCount}_i \\ \mathrm{logit}(\lambda_i) = &\ \beta_{ziA} \mathrm{Algorithm}_i + \alpha_{ziProject[i]} \\ \alpha \thicksim &\ \mathrm{Normal}(-2, 1) \\ \beta_{A}, \beta_{L}, \beta_F \thicksim &\ \mathrm{Normal}(0, 0.25) \\ \beta_{ziA} \thicksim &\ \mathrm{Normal}(0, 2) \\ \alpha_{zip} \thicksim &\ \mathrm{Logistic}(0, 1)\ \ \ \mathrm{for}\ zip = 1..32 \\ \phi \thicksim &\ \mathrm{Normal}(50, 20) \\ \end{split} \]

maucec52p = brm(
  bf(AUCEC5 ~ 1 + Algorithm + LOC + FixCount, zi ~ Algorithm + (1 | Project)),
  data = d,
  family = zero_inflated_beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(0, 2), class=b, dpar="zi"),
    prior(normal(50, 20), class=phi)
  ),
  init = 0,
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(maucec52p, nsamples = 100) + scale_y_continuous(trans='sqrt')

maucec52 = brm(
  bf(AUCEC5 ~ 1 + Algorithm + LOC + FixCount + (1 | Project), zi ~ Algorithm + (1 | Project)),
  data = d,
  family = zero_inflated_beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(0, 2), class=b, dpar="zi"),
    prior(normal(50, 20), class=phi)
  ),
  init = 0,
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)

Results:

summary(maucec52)
##  Family: zero_inflated_beta 
##   Links: mu = logit; phi = identity; zi = logit 
## Formula: AUCEC5 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) 
##          zi ~ Algorithm + (1 | Project)
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~Project (Number of levels: 32) 
##                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)        0.44      0.07     0.33     0.59 1.00     3554     7184
## sd(zi_Intercept)     1.75      0.85     0.28     3.66 1.00     5717     3740
## 
## Population-Level Effects: 
##                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept               -1.21      0.08    -1.37    -1.04 1.00     2347
## zi_Intercept            -7.33      1.45   -10.52    -4.94 1.00    11425
## AlgorithmBugspots       -0.45      0.04    -0.53    -0.37 1.00    27401
## LOC                      0.21      0.04     0.12     0.29 1.00     9703
## FixCount                -0.02      0.03    -0.07     0.04 1.00    17683
## zi_AlgorithmBugspots     1.84      1.09    -0.12     4.20 1.00    24559
##                      Tail_ESS
## Intercept                4270
## zi_Intercept            10342
## AlgorithmBugspots       11994
## LOC                     12021
## FixCount                12702
## zi_AlgorithmBugspots    10632
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi    29.58      1.95    25.93    33.51 1.00    23528    11656
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The posterior predictive check shows a good fit.

pp_check(maucec52, nsamples = 100) + scale_y_continuous(trans='sqrt')

The effect of Bugspots on the logit scale is entirely negative, with no overlap with 0. On the outcome scale, there is no overlap between the two algorithms, with Linespots consistently achieving higher values than Bugspots.

mcmc_areas(maucec52, pars = c("b_AlgorithmBugspots"), prob = 0.95) + ggtitle("AUCEC5 Posterior Distribution\n With 95% intervals")

eff = conditional_effects(maucec52, effects = c("Algorithm"))
eff$Algorithm
eff

And the diagnostics: all look good, with the smallest effective sample size ratio well above 0.1 and the largest Rhat below 1.01.

min(neff_ratio(maucec52))
## [1] 0.1473713
max(rhat(maucec52))
## [1] 1.000866
plot(maucec52)

M1

maucec51p = brm(
  bf(AUCEC5 ~ 1 + Algorithm + LOC, zi ~ Algorithm + (1 | Project)),
  data = d,
  family = zero_inflated_beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(0, 2), class=b, dpar="zi"),
    prior(normal(50, 20), class=phi)
  ),
  init = 0,
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(maucec51p, nsamples = 100) + scale_y_continuous(trans='sqrt')

maucec51 = brm(
  bf(AUCEC5 ~ 1 + Algorithm + LOC, zi ~ Algorithm + (1 | Project)),
  data = d,
  family = zero_inflated_beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(0, 2), class=b, dpar="zi"),
    prior(normal(50, 20), class=phi)
  ),
  init = 0,
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(maucec51, nsamples = 100) + scale_y_continuous(trans='sqrt')

summary(maucec51)
##  Family: zero_inflated_beta 
##   Links: mu = logit; phi = identity; zi = logit 
## Formula: AUCEC5 ~ 1 + Algorithm + LOC 
##          zi ~ Algorithm + (1 | Project)
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~Project (Number of levels: 32) 
##                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(zi_Intercept)     1.75      0.86     0.25     3.70 1.00     5940     3996
## 
## Population-Level Effects: 
##                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept               -1.18      0.04    -1.25    -1.11 1.00    27599
## zi_Intercept            -7.32      1.45   -10.54    -4.93 1.00    12209
## AlgorithmBugspots       -0.42      0.05    -0.52    -0.32 1.00    26109
## LOC                      0.07      0.03     0.02     0.12 1.00    33709
## zi_AlgorithmBugspots     1.84      1.09    -0.10     4.18 1.00    24736
##                      Tail_ESS
## Intercept               12331
## zi_Intercept             9832
## AlgorithmBugspots       11690
## LOC                     11132
## zi_AlgorithmBugspots    11076
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi    16.33      1.04    14.36    18.41 1.00    28359    12427
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

M3

maucec53p = brm(
  bf(AUCEC5 ~ 1 + Algorithm + LOC + FixCount + (1 | Project), zi ~ Algorithm + (1 | Project)),
  data = d,
  family = zero_inflated_beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(0, 2), class=b, dpar="zi"),
    prior(weibull(2, 1), class=sd),
    prior(normal(50, 20), class=phi)
  ),
  init = 0,
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(maucec53p, nsamples = 100) + scale_y_continuous(trans='sqrt')

maucec53 = brm(
  bf(AUCEC5 ~ 1 + Algorithm + LOC + FixCount + (1 | Project), zi ~ Algorithm + (1 | Project)),
  data = d,
  family = zero_inflated_beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(0, 2), class=b, dpar="zi"),
    prior(weibull(2, 1), class=sd),
    prior(normal(50, 20), class=phi)
  ),
  init = 0,
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(maucec53, nsamples = 100) + scale_y_continuous(trans='sqrt')

summary(maucec53)
##  Family: zero_inflated_beta 
##   Links: mu = logit; phi = identity; zi = logit 
## Formula: AUCEC5 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) 
##          zi ~ Algorithm + (1 | Project)
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~Project (Number of levels: 32) 
##                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)        0.45      0.07     0.34     0.60 1.00     3330     6214
## sd(zi_Intercept)     1.75      0.85     0.29     3.70 1.00     5430     3939
## 
## Population-Level Effects: 
##                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept               -1.21      0.09    -1.37    -1.04 1.00     2021
## zi_Intercept            -7.32      1.45   -10.58    -4.93 1.00     9903
## AlgorithmBugspots       -0.46      0.04    -0.54    -0.37 1.00    22895
## LOC                      0.21      0.04     0.12     0.29 1.00     8400
## FixCount                -0.02      0.03    -0.07     0.03 1.00    15596
## zi_AlgorithmBugspots     1.83      1.08    -0.11     4.15 1.00    19862
##                      Tail_ESS
## Intercept                3568
## zi_Intercept             9832
## AlgorithmBugspots       12021
## LOC                     10652
## FixCount                12670
## zi_AlgorithmBugspots    11114
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi    29.60      1.97    25.89    33.53 1.00    21058    11452
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

M4

maucec54p = brm(
  bf(AUCEC5 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language), zi ~ Algorithm + (1 | Project)),
  data = d,
  family = zero_inflated_beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(0, 2), class=b, dpar="zi"),
    prior(weibull(2, 1), class=sd),
    prior(normal(50, 20), class=phi)
  ),
  init = 0,
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = "only",
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(maucec54p, nsamples = 100) + scale_y_continuous(trans='sqrt')

maucec54 = brm(
  bf(AUCEC5 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language), zi ~ Algorithm + (1 | Project)),
  data = d,
  family = zero_inflated_beta(),
  prior = c(
    prior(normal(-2, 1), class=Intercept),
    prior(normal(0, 0.25), class=b),
    prior(normal(0, 2), class=b, dpar="zi"),
    prior(weibull(2, 1), class=sd),
    prior(normal(50, 20), class=phi)
  ),
  init = 0,
  iter = SAMPLES,
  warmup = WARMUP,
  chains = CHAINS,
  cores = parallel::detectCores(),
  sample_prior = FALSE,
  control = list(adapt_delta = DELTA, max_treedepth = TREE),
  seed = SEED
)
pp_check(maucec54, nsamples = 100) + scale_y_continuous(trans='sqrt')

summary(maucec54)
##  Family: zero_inflated_beta 
##   Links: mu = logit; phi = identity; zi = logit 
## Formula: AUCEC5 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language) 
##          zi ~ Algorithm + (1 | Project)
##    Data: d (Number of observations: 480) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~language (Number of levels: 8) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.40      0.17     0.14     0.81 1.00     4176     4326
## 
## ~Project (Number of levels: 32) 
##                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)        0.37      0.07     0.26     0.51 1.00     4335     8732
## sd(zi_Intercept)     1.75      0.86     0.25     3.71 1.00     5047     3617
## 
## Population-Level Effects: 
##                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept               -1.26      0.17    -1.62    -0.95 1.00     5319
## zi_Intercept            -7.32      1.47   -10.63    -4.92 1.00     9390
## AlgorithmBugspots       -0.46      0.04    -0.54    -0.38 1.00    27050
## LOC                      0.22      0.04     0.13     0.30 1.00    14566
## FixCount                -0.02      0.03    -0.07     0.04 1.00    19173
## zi_AlgorithmBugspots     1.83      1.10    -0.11     4.20 1.00    20555
##                      Tail_ESS
## Intercept                6750
## zi_Intercept             9574
## AlgorithmBugspots       12360
## LOC                     12457
## FixCount                12357
## zi_AlgorithmBugspots    11032
## 
## Family Specific Parameters: 
##     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi    29.65      1.99    25.88    33.64 1.00    20686    11503
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Comparison

Judged by a rough 95% interval on the elpd difference (elpd_diff ± 2 se_diff), M3 and M4 perform equally well. As M3 is the simpler model, we use it as our candidate model.

loo_maucec51 = loo(maucec51)
loo_maucec52 = loo(maucec52)
loo_maucec53 = loo(maucec53)
loo_maucec54 = loo(maucec54)
loo_compare(loo_maucec51, loo_maucec52, loo_maucec53, loo_maucec54)
##          elpd_diff se_diff
## maucec54    0.0       0.0 
## maucec53   -0.1       0.8 
## maucec52   -0.2       0.8 
## maucec51 -122.9      17.6

Bürkner, Paul-Christian. 2020. Brms: Bayesian Regression Models Using ’Stan’. https://CRAN.R-project.org/package=brms.

Gabry, Jonah, and Tristan Mahr. 2020. Bayesplot: Plotting for Bayesian Models. https://CRAN.R-project.org/package=bayesplot.